210 research outputs found
Generative modelling and adversarial learning
University of Technology Sydney, Faculty of Engineering and Information Technology.
A main goal of statistics and machine learning is to represent and manipulate high-dimensional probability distributions of real-world data, such as natural images. Generative adversarial networks (GANs), which are based on the adversarial learning paradigm, are one of the main families of methods for deriving generative models from complicated real-world data. GANs and their variants use a generator to synthesise semantic data from standard signal distributions and train a discriminator to distinguish real samples in the training dataset from fake samples synthesised by the generator. As its adversary, the generator aims to deceive the discriminator by producing ever more realistic samples. Through this two-player adversarial game between the generator and discriminator, the generated distribution can approximate the real-world distribution, allowing realistic samples to be drawn from it.
This thesis aims both to improve the quality of generative modelling and to manipulate generated samples by specifying multiple scene properties. A novel framework for training GANs is proposed to stabilise the training process and produce more realistic samples. Unlike existing GANs, which alternately train a generator and a discriminator using a pre-defined adversarial objective function, the proposed framework utilises different adversarial training objectives as mutation operations and trains a population of generators to adapt to the environment (i.e. the discriminator). The samples generated by the different generators are evaluated, and only well-performing generators are preserved and used for further training. In this way, the proposed framework overcomes the limitations of any individual adversarial training objective and always preserves the best offspring, contributing to the progress and success of GANs.
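A toy sketch of this evolutionary training loop may help fix the idea. Everything below is an illustrative assumption, not the thesis's actual method: the generators are scalar parameters, the discriminator's judgement is replaced by a fixed fitness score, and the three "mutation" rules merely stand in for distinct adversarial objectives.

```python
import numpy as np

rng = np.random.default_rng(0)
real_mean = 3.0  # toy "real data": we only try to match a 1-D Gaussian mean

def fitness(theta):
    # Stand-in for the quality score a discriminator would assign:
    # the closer the generated mean is to the real mean, the fitter.
    return -(theta - real_mean) ** 2

# Three "mutation" update rules, standing in for distinct adversarial
# objectives (e.g. minimax, heuristic, and least-squares losses).
mutations = [
    lambda t: t + 0.5 * (real_mean - t),   # strong pull towards the data
    lambda t: t + 0.1 * (real_mean - t),   # gentle pull
    lambda t: t + rng.normal(scale=0.2),   # random exploration
]

population = list(rng.normal(size=4))      # initial generator parameters
for generation in range(20):
    # Each parent produces one offspring per mutation operation...
    offspring = [m(theta) for theta in population for m in mutations]
    # ...and only the best-performing offspring survive.
    offspring.sort(key=fitness, reverse=True)
    population = offspring[:4]

best = population[0]
```

The selection step is the point: no single mutation rule has to work everywhere, because the population always retains whichever offspring the (here, fixed) evaluation scores best.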
Building on the GAN framework, this thesis devises a novel model called the perceptual adversarial network (PAN). The proposed PAN consists of two feed-forward convolutional neural networks: a transformation network and a discriminative network. Besides the generative adversarial loss widely used in GANs, this thesis proposes a perceptual adversarial loss, which undergoes adversarial training between the transformation network and the hidden layers of the discriminative network. The hidden layers and output of the discriminative network are updated to constantly and automatically discover discrepancies between a transformed image and the corresponding ground truth, while the image transformation network is trained to minimise the discrepancies identified by the discriminative network.
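The key ingredient above, measuring discrepancy in a discriminator's hidden layers rather than in pixel space, might be sketched as follows. The two-layer random network and the L1 distance are simplifying assumptions for illustration, not PAN's actual architecture or loss.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-in discriminator: two fixed random "hidden layers".
W1 = rng.normal(size=(8, 16))
W2 = rng.normal(size=(16, 4))

def hidden_features(x):
    h1 = np.tanh(x @ W1)   # first hidden layer
    h2 = np.tanh(h1 @ W2)  # second hidden layer
    return [h1, h2]

def perceptual_adversarial_loss(transformed, ground_truth):
    # Sum of L1 discrepancies measured in the discriminator's hidden
    # layers, rather than in raw pixel space.
    return sum(
        np.abs(ft - fg).mean()
        for ft, fg in zip(hidden_features(transformed),
                          hidden_features(ground_truth))
    )

x = rng.normal(size=8)                # flattened "ground-truth" image
noisy = x + 0.5 * rng.normal(size=8)  # imperfect transformed output
loss_noisy = perceptual_adversarial_loss(noisy, x)
loss_exact = perceptual_adversarial_loss(x, x)
```

In the adversarial version described in the abstract, the discriminator's layers would additionally be trained to *maximise* this discrepancy, so the transformation network can never satisfy a fixed, hand-picked feature metric.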
Furthermore, to extend the generative models to more challenging re-rendering tasks, this thesis explores the disentangled representations encoded in real-world samples and proposes a principled tag disentangled generative adversarial network that re-renders new samples of an object of interest from a single image by specifying multiple scene properties. Specifically, from an input sample, a disentangling network extracts disentangled and interpretable representations, which are then used by the generative network to generate new samples. To improve the quality of the disentangled representations, a tag mapping net checks the consistency between the image and its tags.
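A minimal sketch of the disentangle-then-re-render idea described above. The linear encoder/decoder and the 2+2 split of the code into identity and scene-property parts are assumptions made purely for illustration; the actual networks are deep and the split is learned with the help of the tag mapping net.

```python
import numpy as np

rng = np.random.default_rng(0)

# Stand-ins for the networks: fixed random linear maps.
W_enc = rng.normal(size=(6, 4))  # "disentangling network"
W_dec = rng.normal(size=(4, 6))  # "generative network"

def disentangle(image):
    code = image @ W_enc
    # Assumed split: first 2 dims encode identity, last 2 encode
    # scene properties (e.g. viewpoint and illumination tags).
    return code[:2], code[2:]

def re_render(identity, properties):
    return np.concatenate([identity, properties]) @ W_dec

image = rng.normal(size=6)
identity, props = disentangle(image)
# Re-render the same object with one scene property changed.
new_image = re_render(identity, props + np.array([1.0, 0.0]))
```

Editing only the property part of the code while keeping the identity part fixed is what lets a single input image yield new renderings of the same object under different specified scene properties.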
Finally, experiments on various challenging datasets and image synthesis tasks demonstrate the good performance of the proposed frameworks on the problems of interest.
A User-Centered Concept Mining System for Query and Document Understanding at Tencent
Concepts embody the knowledge of the world and facilitate the cognitive
processes of human beings. Mining concepts from web documents and constructing
the corresponding taxonomy are core research problems in text understanding and
support many downstream tasks such as query analysis, knowledge base
construction, recommendation, and search. However, we argue that most prior
studies extract formal and overly general concepts from Wikipedia or static web
pages, which do not represent the user perspective. In this paper, we
describe our experience of implementing and deploying ConcepT in Tencent QQ
Browser. It discovers user-centered concepts at the right granularity
conforming to user interests, by mining a large amount of user queries and
interactive search click logs. The extracted concepts have the proper
granularity, are consistent with user language styles and are dynamically
updated. We further present our techniques to tag documents with user-centered
concepts and to construct a topic-concept-instance taxonomy, which has helped
to improve search as well as news feeds recommendation in Tencent QQ Browser.
We performed extensive offline evaluation to demonstrate that our approach
could extract concepts of higher quality compared to several other existing
methods. Our system has been deployed in Tencent QQ Browser. Results from
online A/B testing involving a large number of real users suggest that the
Impression Efficiency of feeds users increased by 6.01% after incorporating the
user-centered concepts into the recommendation framework of Tencent QQ Browser.
Comment: Accepted by KDD 201
Cross-Modal Contrastive Learning for Robust Reasoning in VQA
Multi-modal reasoning in visual question answering (VQA) has witnessed rapid
progress recently. However, most reasoning models heavily rely on shortcuts
learned from training data, which prevents their usage in challenging
real-world scenarios. In this paper, we propose a simple but effective
cross-modal contrastive learning strategy to get rid of the shortcut reasoning
caused by imbalanced annotations and improve the overall performance. Different
from existing contrastive learning with complex negative categories on coarse
(Image, Question, Answer) triplet level, we leverage the correspondences
between the language and image modalities to perform finer-grained cross-modal
contrastive learning. We treat each Question-Answer (QA) pair as a whole, and
differentiate between images that conform with it and those against it. To
alleviate the issue of sampling bias, we further build connected graphs among
images. For each positive pair, we regard the images from different graphs as
negative samples and derive a multi-positive version of contrastive learning.
To the best of our knowledge, this is the first paper to reveal that a general
contrastive learning strategy without delicate hand-crafted rules can
contribute to robust VQA reasoning. Experiments on several mainstream VQA
datasets demonstrate our superiority over the state of the art. Code is available at
\url{https://github.com/qizhust/cmcl_vqa_pl}
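The multi-positive objective the abstract describes can be sketched roughly as below. The embeddings, temperature, and mask here are illustrative assumptions; the paper's actual formulation and sampling scheme may differ.

```python
import numpy as np

def multi_positive_nce(anchor, images, positive_mask, tau=0.1):
    # anchor: embedding of a Question-Answer pair treated as a whole;
    # images: candidate image embeddings. Images that conform with the
    # QA pair are positives; images drawn from other connected graphs
    # act as negatives.
    sims = images @ anchor / tau
    logits = np.exp(sims - sims.max())  # shift for numerical stability
    return -np.log(logits[positive_mask].sum() / logits.sum())

anchor = np.array([1.0, 0.0])  # QA-pair embedding
images = np.stack([
    [0.9, 0.1],    # image conforming with the QA pair
    [0.8, -0.1],   # another conforming image
    [-1.0, 0.2],   # image from a different graph (negative)
])
mask = np.array([True, True, False])
loss = multi_positive_nce(anchor, images, mask)
```

Because several images can conform with one QA pair, all of them sit in the numerator at once, which is what distinguishes this from the usual single-positive InfoNCE over (Image, Question, Answer) triplets.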
MagicFusion: Boosting Text-to-Image Generation Performance by Fusing Diffusion Models
The advent of open-source AI communities has produced a cornucopia of
powerful text-guided diffusion models that are trained on various datasets.
However, few explorations have been conducted on ensembling such models to
combine their strengths. In this work, we propose a simple yet effective method called
Saliency-aware Noise Blending (SNB) that can empower the fused text-guided
diffusion models to achieve more controllable generation. Specifically, we
experimentally find that the responses of classifier-free guidance are highly
related to the saliency of generated images. Thus we propose to trust different
models in their areas of expertise by blending the predicted noises of two
diffusion models in a saliency-aware manner. SNB is training-free and can be
completed within a DDIM sampling process. Additionally, it can automatically
align the semantics of two noise spaces without requiring additional
annotations such as masks. Extensive experiments show the impressive
effectiveness of SNB in various applications. Project page is available at
https://magicfusion.github.io/
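A rough sketch of blending two models' predicted noises in a saliency-aware manner, as described above. The saliency proxy and the per-pixel weighting rule below are assumptions for illustration, not the paper's exact SNB formulation.

```python
import numpy as np

def saliency(noise):
    # Toy saliency proxy: where a model's classifier-free-guidance
    # response is strongest is treated as its "area of expertise".
    mag = np.abs(noise)
    return mag / (mag.max() + 1e-8)

def blend_noises(eps_a, eps_b):
    # Trust each model more where its own saliency is higher
    # (a simplified, assumed form of saliency-aware blending).
    sa, sb = saliency(eps_a), saliency(eps_b)
    w = sa / (sa + sb + 1e-8)
    return w * eps_a + (1.0 - w) * eps_b

rng = np.random.default_rng(0)
eps_a = rng.normal(size=(4, 4))  # predicted noise from model A
eps_b = rng.normal(size=(4, 4))  # predicted noise from model B
eps = blend_noises(eps_a, eps_b)
```

Since the blend happens purely in noise space at each denoising step, a scheme like this needs no retraining and no extra annotations such as masks, matching the training-free, within-DDIM property the abstract claims.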
FakeCLR: Exploring Contrastive Learning for Solving Latent Discontinuity in Data-Efficient GANs
Data-Efficient GANs (DE-GANs), which aim to learn generative models with a
limited amount of training data, encounter several challenges for generating
high-quality samples. While data augmentation strategies have largely
alleviated the training instability, how to further improve the generative
performance of DE-GANs has become a hot topic. Recently, contrastive learning has
shown great potential for increasing the synthesis quality of DE-GANs, yet the
underlying principles are not well explored. In this paper, we revisit and compare
different contrastive learning strategies in DE-GANs, and identify (i) the
current bottleneck of generative performance is the discontinuity of latent
space; (ii) compared to other contrastive learning strategies,
Instance-perturbation works towards latent space continuity, which brings the
major improvement to DE-GANs. Based on these observations, we propose FakeCLR,
which only applies contrastive learning on perturbed fake samples, and devises
three related training techniques: Noise-related Latent Augmentation,
Diversity-aware Queue, and Forgetting Factor of Queue. Our experimental results
establish a new state of the art in both few-shot generation and limited-data
generation. On multiple datasets, FakeCLR achieves more than 15% FID
improvement compared to existing DE-GANs. Code is available at
https://github.com/iceli1007/FakeCLR.
Comment: Accepted by ECCV202
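A sketch of contrastive learning applied only to perturbed fake samples, following the description of FakeCLR's noise-related latent augmentation. The linear stand-in generator and the InfoNCE details below are illustrative assumptions; the queue-based techniques (Diversity-aware Queue, Forgetting Factor) are omitted.

```python
import numpy as np

rng = np.random.default_rng(0)

def generator(z):
    # Stand-in generator: a fixed linear map from latents to samples.
    W = np.array([[1.0, 0.2], [-0.3, 0.8]])
    return z @ W

def latent_augment(z, sigma=0.05):
    # Noise-related latent augmentation: two perturbed views of the
    # same latent code form a positive pair.
    return (z + sigma * rng.normal(size=z.shape),
            z + sigma * rng.normal(size=z.shape))

def fake_contrastive_loss(view_a, view_b, tau=0.5):
    # InfoNCE over fake samples only: matching views are positives,
    # all other fakes in the batch act as negatives.
    a = view_a / np.linalg.norm(view_a, axis=1, keepdims=True)
    b = view_b / np.linalg.norm(view_b, axis=1, keepdims=True)
    logits = a @ b.T / tau
    logits -= logits.max(axis=1, keepdims=True)
    probs = np.exp(logits) / np.exp(logits).sum(axis=1, keepdims=True)
    return -np.log(np.diag(probs)).mean()

z = rng.normal(size=(8, 2))  # batch of latent codes
va, vb = latent_augment(z)
loss = fake_contrastive_loss(generator(va), generator(vb))
```

Minimising such a loss encourages nearby latents to map to nearby samples, which is how instance perturbation pushes towards the latent-space continuity the paper identifies as the bottleneck.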